[SPARK-43341][SQL] Patch StructType.toDDL not picking up on non-nullability of nested column#41016
Closed
BramBoog wants to merge 14 commits intoapache:masterfrom
Closed
[SPARK-43341][SQL] Patch StructType.toDDL not picking up on non-nullability of nested column#41016BramBoog wants to merge 14 commits intoapache:masterfrom
BramBoog wants to merge 14 commits intoapache:masterfrom
Conversation
Member
|
cc @MaxGekk FYI |
Author
|
Hmm yeah that was strange, I should have noticed that immediately :/. Seems GitHub had added a whole list of commits which already were on master to the diff. Changing the base to a different branch and back to master has fixed it though. |
|
Hey guys, any chance to have the PR merged? Although not critical it would simplify our schema tests a lot. |
Author
|
@zaza Yeah the way I see it the PR is ready, I've just been waiting for a review |
Member
|
@BramBoog it has a conflicts against the lastest master branch. You would need to resolve the conflicts by git fetch upstream & git rebase upstream/master |
Author
|
@HyukjinKwon Apologies, it took a while for me to find the time to get back to this. Resolved the conflicts, PR is up to date again. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
When converting a StructType instance containing a nested StructType column which in turn contains a column for which
nullable = falseto a DDL string using.toDDL, the resulting DDL string does not include this non-nullability. For example:gives
This is due to the fact that
StructType.toDDLcallsStructField.toDDLfor its fields, which in turn calls.sqlfor itsdataType. IfdataTypeis a StructType, the call to.sqlin turn calls.sqlfor all the nested fields, and this last method does not include the nullability of the field in its output. The proposed solution is therefore to haveStructField.toDDLcalldataType.toDDLfor a StructType, since this will include information about nullability of nested columns.To work around the different DDL formats of nested and non-nested structs (the former is wrapped in
"STRUCT ...>"and usescolName: dataTypefor its fields instead ofcolName dataType), package-private nested-specific versions of.toDDLhave been added for StructType and StructField.Why are the changes needed?
Currently, converting a StructType schema to a DDL string does not pass information about nullability of nested columns. This leads to a loss of information, and means converting to DDL and then back could alter the StructType schema.
Does this PR introduce any user-facing change?
Yes, given the example above, the output will become:
How was this patch tested?
In
StructTypeSuite, thenestedStructtesting value has been modified to include a non-nullable nested column. The relevant unit tests have been changed accordingly.